Implementation of the validation testing in MPPG 5.a “Commissioning and QA of treatment planning dose calculations–megavoltage photon and electron beams”

Abstract

The AAPM Medical Physics Practice Guideline (MPPG) 5.a provides concise guidance on the commissioning and QA of beam modeling and dose calculation in radiotherapy treatment planning systems. This work discusses the implementation of the validation testing recommended in MPPG 5.a at two institutions. The two institutions worked collaboratively to create a common set of treatment fields and analysis tools to deliver and analyze the validation tests. This included the development of a novel, open‐source software tool to compare scanning water tank measurements to 3D DICOM‐RT Dose distributions. Dose calculation algorithms in both Pinnacle and Eclipse were tested with MPPG 5.a to validate the modeling of Varian TrueBeam linear accelerators. The validation process resulted in more than 200 water tank scans and more than 50 point measurements per institution, each of which was compared to a dose calculation from the institution's treatment planning system (TPS). Overall, the validation testing recommended in MPPG 5.a took approximately 79 person‐hours for a machine with four photon and five electron energies for a single TPS. Of the 79 person‐hours, 26 person‐hours required time on the machine, and the remainder involved preparation and analysis. The basic photon, electron, and heterogeneity correction tests were evaluated with the tolerances in MPPG 5.a, and the tolerances were met for all tests. The MPPG 5.a evaluation criteria were used to assess the small field and IMRT/VMAT validation tests. Both institutions found the use of MPPG 5.a to be a valuable resource during the commissioning process. The validation testing in MPPG 5.a showed the strengths and limitations of the TPS models. In addition, the data collected during the validation testing is useful for routine QA of the TPS, validation of software upgrades, and commissioning of new algorithms.

The AAPM Task Group Reports No. 23 and No. 53 were published in 1995 and 1998, respectively, to provide comprehensive guidelines for acceptance testing, commissioning, and ongoing quality assurance of 3D TPS.
Starkschall et al. 5 described a beam modeling methodology for the convolution/superposition dose calculation algorithm in the Pinnacle TPS, including an assessment of model accuracy using the recommended procedures in TG-53. In 2001, Venselaar et al. 6 proposed a set of tests and appropriate tolerances for photon beam dose calculations. In addition, validation tests for nonstandard treatment geometries, inhomogeneous media, MLC modeling, and commissioning have been described. [7][8][9] The 2008 AAPM Task Group Report No. 106 10 discusses equipment and procedures to ensure the accurate and self-consistent collection of commissioning beam data. The 2009 AAPM Task Group Report No. 119 11 provides guidance and test cases for IMRT commissioning.
A number of previous studies discussed the development of custom tools for automated analysis of commissioning data. Adnani 12 designed a TG-106 compliant linear accelerator data management system for physics data acquisition, processing, and validation. Birgani et al. 13 created a MATLAB program for comparing commissioning measurements and dose distributions from a custom-made second-check software. Bergman et al. 14 used MATLAB to perform an automated 3D gamma analysis comparing treatment planning system and Monte Carlo dose distributions for the HD120 MLC on a Varian TrueBeam linear accelerator. Several authors have reported developing in-house software for comparing commissioning data to radiochromic film measurements. [15][16][17] Reports of tools for the analysis of arbitrary validation fields are much less common. Jacqmin et al. 18 created a comprehensive and efficient system to validate photon dose calculation algorithm accuracy based on the tests recommended in TG-53. Kim et al. 19 developed an automated quality assurance procedure for the Pinnacle TPS that assessed beam model accuracy for commissioning beam geometries and additional clinical scenarios. Although all of these tools have contributed greatly to the automation of many commissioning tasks, none were specifically tailored to analyze data captured for MPPG 5.a. We therefore chose to create a new open-source software tool to share with others who will perform the MPPG 5.a tests. An accompanying spreadsheet is also included to aid the user in analyzing and organizing the results of the MPPG 5.a tests.
The first objective of this work is to report our experience implementing the validation testing in MPPG 5.a at two institutions. The second objective of this work is to present methods and tools that were created to facilitate the delivery and analysis of the validation tests. A set of treatment fields and corresponding multileaf collimator (MLC) patterns, scan queues, and an open-source MATLAB program were designed. These tools and materials can be disseminated to the physics community to aid with the implementation of the MPPG 5.a guideline.

2 | METHODS
The MPPG 5.a report intentionally allows for flexibility in data acquisition, tools, and processes. The measured data can be acquired with a variety of detectors in solid phantoms, planar or volumetric QA devices, or a scanning water tank. The measurement tools used for this project are summarized in Table 1.

2.A | Preparation
Dose calculations were performed at both institutions using the same virtual water tank (a cube of water created in the Eclipse TPS at MUSC). The use of the same virtual water phantom improved the coordination of test planning and data analysis between our centers. For tests 6.2 and 8.3, simple custom phantoms were created from slabs of Gammex RMI Model 457 Solid Water (Gammex RMI, Middleton, WI, USA) and cork as illustrated in Fig. 1. The Delta4 phantom (ScandiDos AB, Uppsala, Sweden) and MapCHECK2 (Sun Nuclear Corporation, Melbourne, FL, USA) were used for the IMRT/VMAT and TG-119 tests (7.3-7.4). The MapCHECK2 was also used for the nonphysical wedge fields in test 5.9.
The plans created in the TPS for each test were structured to make organization and data export as efficient as possible. In Pinnacle, a separate test patient was created for each of the four primary datasets: water tank, heterogeneous phantom, TG-119 simulated dataset, and IMRT device. Plans and trials were used to organize the validation tests. In Eclipse, a single test patient was created to store and organize most of the validation tests. Separate courses were created to categorize the tests based on the imaging datasets. A single plan was created in the appropriate course for each water tank test and heterogeneity test, and each plan contained beams with the same aperture but different energies. The plan organization technique has no effect on the dose calculations; it was done for plan and data management purposes only. For both TPSs, dose was calculated using a 2-mm dose grid and exported separately for each beam. The Pinnacle dose distributions were calculated with the "Adaptive Convolution" algorithm for photons and the "Electron 3D" algorithm for electrons. The Eclipse photon dose was calculated with both "AAA" and "Acuros XB." The electron dose was calculated with the "eMC" algorithm.
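As a concrete illustration of the per-beam export described above, the MATLAB sketch below loads a single exported DICOM-RT Dose file and verifies the 2-mm dose grid before any analysis. This is a minimal example under stated assumptions, not part of our clinical workflow: the filename is hypothetical, and the Image Processing Toolbox (for dicominfo and dicomread) is assumed to be available.

```matlab
% Load one exported per-beam DICOM-RT Dose file (hypothetical filename).
info = dicominfo('beam_6X_test5p4.dcm');

% Convert the stored integer grid to dose in Gy using the scaling factor.
dose = squeeze(double(dicomread('beam_6X_test5p4.dcm'))) * info.DoseGridScaling;

% Sanity-check the grid resolution requested in the TPS (2-mm isotropic).
assert(all(abs(info.PixelSpacing - 2.0) < 0.01), 'Unexpected in-plane spacing');
assert(all(abs(diff(info.GridFrameOffsetVector) - 2.0) < 0.01), 'Unexpected slice spacing');

fprintf('Grid: %d x %d x %d voxels, max dose %.3f Gy\n', ...
    size(dose, 1), size(dose, 2), size(dose, 3), max(dose(:)));
```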

2.B | Measurements for profile tests
Scanning water tank measurements were used for a number of the validation tests. The MLC field shapes for the basic photon tests in section 5 of MPPG 5.a are illustrated in Fig. 2. The small MLC field (test 5.4) was designed to simulate a typical small, nonrectangular MLC-defined treatment field. The large MLC field (test 5.5) is a larger field with extensive MLC blocking. Test 5.6 is an off-axis field with the X-field edge defined by MLCs at maximum allowed leaf over-travel and the Y-field edge defined by the jaws. Test 5.7 is an asymmetric field measured at a short source to surface distance (SSD); an SSD of 80 cm was the closest achievable distance at both institutions. Test 5.8 is an MLC-shaped field at oblique incidence.

The measured scans were compared to the TPS dose calculations with the Profile Comparison Tool (PCT), the open-source MATLAB program developed for this work. The program has an easy-to-use graphical user interface (Fig. 5).
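The core operation in a tool of this kind is resampling the 3D TPS dose grid along the coordinates of each water tank scan. The sketch below shows one way this can be done with trilinear interpolation; it is an illustrative fragment rather than the actual PCT source code, and the filename, scan line, and the assumption that the tank axes align with the DICOM patient axes are ours.

```matlab
% Load the calculated dose grid (hypothetical filename).
info = dicominfo('beam_6X_test5p4.dcm');
dose = squeeze(double(dicomread('beam_6X_test5p4.dcm'))) * info.DoseGridScaling;

% Grid coordinate vectors in mm. In the dose array, rows vary with y,
% columns with x, and frames with z (MATLAB meshgrid convention).
x = info.ImagePositionPatient(1) + (0:size(dose, 2) - 1) * info.PixelSpacing(2);
y = info.ImagePositionPatient(2) + (0:size(dose, 1) - 1) * info.PixelSpacing(1);
z = info.ImagePositionPatient(3) + info.GridFrameOffsetVector(:)';

% Example scan line: a crossline profile at 10 cm depth on the central
% axis, sampled every 1 mm (axis orientation assumed for illustration).
xq = -200:1:200;
yq = 100 * ones(size(xq));
zq = zeros(size(xq));

% Trilinear interpolation of the dose grid at the scan coordinates.
calcProfile = interp3(x, y, z, dose, xq, yq, zq, 'linear');

% Normalize and plot for visual comparison with the measured scan.
plot(xq, 100 * calcProfile / max(calcProfile));
xlabel('Crossline position (mm)'); ylabel('Relative dose (%)');
```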
A sample output from the PCT is shown in Fig. 6. The MLC-defined fields for the small field tests are illustrated in Fig. 3. The fields were designed to be smaller in one dimension than the smallest field size for which an output factor is defined in the TPS. This allowed us to explore the limitations of our models for small, irregular field shapes common in IMRT and VMAT.
For the verification of enhanced dynamic wedges (EDWs) (test 5.9), a large MLC-shaped field, as shown in Fig. 2f, was used. The tolerances specified in MPPG 5.a for the basic photon and electron tests are used in this work. For the IMRT/VMAT evaluation, we used the evaluation criteria listed in Table 8 of MPPG 5.a.
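Profile agreement in this work is reported with the gamma index, which combines dose difference and distance-to-agreement into a single quantity. The MATLAB function below is a minimal global-gamma sketch for 1D profiles; the function name gamma1d is ours, this is not the PCT implementation, and the 2%/2 mm criteria in the usage comment are illustrative rather than values prescribed by MPPG 5.a.

```matlab
function [gammaVals, passRate] = gamma1d(posMeas, doseMeas, posCalc, doseCalc, ddPct, dtaMm)
% 1D global gamma analysis of a measured profile against a calculated one.
% Example: [g, pass] = gamma1d(xMeas, dMeas, xCalc, dCalc, 2, 2);
% For accurate results, the calculated profile should be sampled finely
% (e.g., interpolated to a sub-millimeter grid) before calling this function.
    normDose = max(doseMeas);               % global normalization dose
    gammaVals = zeros(size(doseMeas));
    for i = 1:numel(doseMeas)
        % Dose-difference and distance-to-agreement terms for point i.
        dd = (doseCalc - doseMeas(i)) / (normDose * ddPct / 100);
        dta = (posCalc - posMeas(i)) / dtaMm;
        % Gamma is the minimum of the combined metric over the calc profile.
        gammaVals(i) = min(sqrt(dd.^2 + dta.^2));
    end
    passRate = 100 * mean(gammaVals <= 1);  % percent of points with gamma <= 1
end
```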

3 | RESULTS
The validation tests outlined in MPPG 5.a were performed at two institutions with newly created tools. The validation experience resulted in more than 200 water tank scans and more than 50 point measurements per institution. Time estimates for preparation, measurement, and analysis activities are summarized in Table 2.

3.A | MPPG 5.a tests with tolerance values
The PCT also allowed us to discover dose discrepancies in our electron models that were not uncovered using the MPPG 5.a tolerances. For example, two of the measured PDDs for the small cutout in test 8.1 showed dose discrepancies for Pinnacle beyond R50, which resulted in gamma passing rates below 95%. An example is shown in Fig. 8. The other PDDs for the small cutout exhibited a similar discrepancy, but the magnitude was small enough that the gamma passing rates remained above 95%.

Point measurements were also acquired for two of the small MLC-defined fields (Figs 3b and 3c), neither of which provides charged particle equilibrium near the peak dose. Consequently, the measurement locations were in high-dose/high-gradient regions.
There was no corresponding tolerance or evaluation criterion listed in the guideline for this situation. Previous studies have reported local dose differences in the out-of-field region as large as 50%. To our knowledge, our study is the first to indicate that the underestimation of out-of-field dose may be worse for Eclipse Acuros XB than for AAA. Accurate modeling of the out-of-field dose can be critical when calculating dose for certain treatment situations, including fetal dose or dose to implantable cardiac devices.
Test 6.1 verified that the HU-value-to-density calibration was appropriately applied for both TPSs. Test 6.2, which validates heterogeneity corrections for photon beams, was a simple and useful test. The Eclipse AAA and Pinnacle CS algorithms were accurate to within 2% for all energies both above and below the heterogeneity.
Due to the unique characteristics of Acuros XB, results of test 6.2 depend upon whether the phantom materials were overridden to non-biological materials in the TPS. The Eclipse Acuros XB algorithm assigns biological materials to each voxel in the CT dataset. 24 When the dose calculation was performed on the phantom using the dose-to-medium reporting method and no material overrides, the dose differences exceeded 2%. For this dose calculation method, the dose at the ion chamber locations was calculated in a combination of "skeletal muscle" and "cartilage" materials that are automatically assigned by Acuros XB. The maximum difference between Acuros XB and the measured dose was 2.2% above the heterogeneity and 2.6% below the heterogeneity. The results improved significantly when the Solid Water and cork regions in the TPS were overridden to the "water" and "cork" materials, respectively. The maximum difference between Acuros XB and the measured dose was 1.3% above the heterogeneity and 0.6% below the heterogeneity when material overrides were used in the dose calculation. We learned that water-equivalent phantoms should be manually overridden to the "water" material for accurate dose calculation when using Acuros XB.
Pinnacle and Eclipse performed well on the small static field validation tests. For users who wish to scrutinize their TPS models further, we recommend comparing their results to previous work on the topic of validation found in the literature. [2][3][4][5][6][7][8][9][10][11] Finally, we want to note what appears to be an error in MPPG 5.a for future users of the guideline. Table 8 in MPPG 5.a uses the term "tolerance" when it is clear from the text of the guideline (particularly section 1B) that the values are meant to be evaluation criteria. We suggest this column heading be changed to "evaluation criteria" to more accurately reflect the intent of the more rigorous IMRT/VMAT gamma criteria.
We recommend a slightly different organization of the tests in future revisions of the guideline.

| CONCLUSION
Implementing the validation tests in MPPG 5.a was a valuable exercise. While it represents a significant time commitment, the resulting infrastructure has been useful for subsequent software and hardware validation tests, as well as routine QA at our institutions. We have presented tools and processes to efficiently perform the MPPG 5.a tests and organize the results. Most of the tools and processes that we used are applicable, or easily adaptable, to other radiation oncology clinics. In particular, the Profile Comparison Tool was designed to be as flexible as possible and work with a number of treatment planning systems and scanning water tank configurations. It is our hope that with only minimal adaptation, the amount of time needed to implement MPPG 5.a testing will be significantly decreased by using the tools we have shared. Overall, MPPG 5.a made QA of TPS easier, more robust, and more uniform across the clinic.
The MPPG 5.a tests serve as an opportunity for physicists to interrogate their beam models in a wide variety of geometries and learn if there are particular geometries in which their beam models perform poorly. The discovery of poorly modeled geometries provides useful information for physicists, who can advise against certain plan designs or perform patient-specific measurements for certain delivery parameter combinations.

ACKNOWLEDGMENTS
The

CONFLICT OF INTEREST
The authors have no conflicts of interest to disclose.